Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents. While the tractability of independent agent-wise exploration is appealing, this approach fails on tasks that require elaborate group strategies. We argue that coordinating the agents' policies can guide their exploration, and we investigate techniques to promote such an inductive bias. We propose two policy regularization methods: TeamReg, which is based on inter-agent action predictability, and CoachReg, which relies on synchronized behavior selection. We evaluate each approach on four challenging continuous control tasks with sparse rewards that require varying levels of coordination, as well as on the discrete-action Google Research Football environment. Our experiments show improved performance across many cooperative multi-agent problems. Finally, we analyze the effects of our proposed methods on the policies that our agents learn and show that our methods successfully enforce the qualities that we propose as proxies for coordinated behaviors.
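To make the abstract's point about exponential growth concrete, here is a minimal sketch that compares the number of joint actions with the number of per-agent actions. It assumes every agent has the same finite set of discrete actions; the function names and the value of 5 actions per agent are hypothetical and chosen only for illustration, not taken from the paper.

```python
def joint_action_count(actions_per_agent: int, num_agents: int) -> int:
    """Number of distinct joint actions when each of `num_agents` agents
    picks one of `actions_per_agent` discrete actions (assumed identical
    action sets for all agents)."""
    return actions_per_agent ** num_agents


def per_agent_action_count(actions_per_agent: int, num_agents: int) -> int:
    """Total number of individual actions when each agent is considered
    on its own, ignoring how actions combine across agents."""
    return actions_per_agent * num_agents


if __name__ == "__main__":
    # Hypothetical setting: 5 discrete actions per agent.
    for n in (1, 2, 4, 8):
        print(n, per_agent_action_count(5, n), joint_action_count(5, n))
```

The per-agent count grows linearly with the number of agents while the joint count grows exponentially, which is why independent exploration is tractable but can miss strategies that only exist as specific action combinations.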
Review for NeurIPS paper: Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
Summary and Contributions: Based on rebuttal and discussion: Having read all reviews, I recognize that we agree the article is well presented, and I stand by the concerns I raised. Note that I primarily criticized the absence of some relevant context in the original submission (which the authors admit in their rebuttal), rather than the contribution itself (although it may be smaller than claimed). Their rebuttal of my claim that this is a planning setting is fair. While I maintain that it is a self-play setting, this is implied by CTDE and thus need not be stated again. A sour note remains from the introduction overselling the novelty of their contribution [L36-45].
Review for NeurIPS paper: Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
Originally, there was some disagreement between reviewers on this paper, but after rebuttal and careful discussion between reviewers and AC, all agree that the paper is interesting, has merit, and could be proposed for acceptance as a poster. One critical reviewer now recognises that the predictability idea is neat and that the concern about the positioning of the work has been largely clarified. Reviewers agree there is a contribution to joint exploration in MAS, which is one of the bottlenecks that deserves to be addressed and discussed.